
Auto Scaling

Priority: Tier 2. Exam domains: Design Resilient Architectures (Domain 2), Design High-Performing Architectures (Domain 3), Design Cost-Optimized Architectures (Domain 4).

EC2 Auto Scaling automatically adjusts the number of EC2 instances in a group based on demand, ensuring high availability, fault tolerance, and cost efficiency by dynamically launching or terminating instances. It integrates with Elastic Load Balancing (ELB) for traffic distribution and with CloudWatch for monitoring, responding to real-time metric changes or predictable schedules.

Introduction to EC2 Auto Scaling

Amazon EC2 Auto Scaling is a service that automatically adjusts the number of EC2 instances within an Auto Scaling Group (ASG) to meet application demand.

An Auto Scaling group is a collection of EC2 instances treated as a logical grouping for the purposes of scaling and management. The ASG adjusts the number of servers automatically based on application traffic: it adds instances as traffic increases and removes them as traffic decreases.
Auto Scaling provides high availability and cost optimization.
Scaling operations can be performed based on metrics like CPU utilization or network traffic.
Auto Scaling Groups work with Launch Templates or Launch Configurations.
Using Amazon EC2 Auto Scaling provides several key benefits:
  • Fault tolerance: replaces unhealthy instances automatically.
  • High availability: ensures sufficient capacity to meet demand.
  • Performance: distributes the workload for optimal response times.
  • Cost optimization: you pay only for resources when they are needed.

Auto Scaling Group Attributes (Capacity Management)

Auto Scaling Group attributes define the boundaries and desired state for the number of instances.

These attributes control the number of EC2 instances in your Auto Scaling group, managing capacity based on demand and cost considerations.

Minimum Capacity

The lowest number of EC2 instances your Auto Scaling group will ever run. Its purpose is to ensure baseline capacity and prevent application outages during off-peak hours when demand drops.
constraint: Ensures baseline capacity.
Use Cases:
  • Maintaining baseline capacity for critical applications.

Desired Capacity

The target number of EC2 instances an Auto Scaling group aims to maintain at a given point in time for a steady-state workload. It can be set manually or adjusted automatically through scaling policies.
constraint: Cannot be lower than minimum capacity or higher than maximum capacity.
Use Cases:
  • Target number of instances for steady-state workloads.

Maximum Capacity

The highest number of EC2 instances your Auto Scaling group will allow. Its purpose is to control costs and prevent overprovisioning by limiting instance launches even during extreme traffic spikes.
constraint: Limits instance launches.
Use Cases:
  • Preventing overprovisioning and controlling costs during extreme traffic spikes.
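Taken together, the three attributes form simple bounds on the fleet size: any requested desired capacity is held within [minimum, maximum]. A minimal sketch of that clamping relationship (a hypothetical helper for illustration, not an AWS API):

```python
def clamp_desired(requested: int, minimum: int, maximum: int) -> int:
    """Clamp a requested desired capacity to the group's [min, max] bounds,
    mirroring how an ASG never runs fewer than min or more than max instances."""
    return max(minimum, min(requested, maximum))

print(clamp_desired(25, minimum=5, maximum=20))  # capped at maximum: 20
print(clamp_desired(2, minimum=5, maximum=20))   # raised to minimum: 5
print(clamp_desired(10, minimum=5, maximum=20))  # within bounds: 10
```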

Scaling Types and Concepts

Scaling refers to the ability to adjust compute capacity to match fluctuating demand.

Horizontal scaling (scale out/in) involves adding or removing instances to handle the load, while vertical scaling (scale up/down) involves changing the power or resources of a single instance. Horizontal scaling increases high availability and redundancy, and the cloud offers greater flexibility for proactive and dynamic scaling.
Horizontal scaling. Method: Adding more identical EC2 instances rather than increasing the size of a single instance. Use Cases: Stateless applications (e.g., web servers) where each instance can process requests independently. Benefits: Highly effective in cloud environments, can be performed without downtime, ideal for stateless applications.
Vertical scaling. Method: Adding more CPU and RAM to an existing EC2 instance (e.g., upgrading from a t3.small to an m5.2xlarge). Use Cases: Applications that cannot be easily split, such as monolithic applications or tightly coupled components. Limitations: Limited by hardware capacity, often requires brief downtime.
Scaling In means decreasing the number of instances in the group. Scaling Out means increasing the number of instances in the group.
Instance health: the status of an instance (healthy or unhealthy), determined through EC2 status checks, Elastic Load Balancing (ELB) health checks, or custom health checks. Unhealthy instances are terminated and replaced.
ASG handles unhealthy instances by replacing them, ensuring continuous service even during failures. Deploying applications across multiple Availability Zones (AZs) and running applications on multiple EC2 instances across at least two AZs minimizes single points of failure.
Thrashing occurs when instances are constantly added and removed due to rapid fluctuations in metrics. To avoid this, configure alarms to trigger only after a sustained period of metric breach (Alarm sustain period), or pause scaling activities for a specified period after a scaling event (Cooldown period).
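The cooldown rule above is easy to express directly: a new scaling action is allowed only when no previous action exists or the cooldown window since the last one has fully elapsed. A minimal sketch (hypothetical helper; the 300-second default matches the console default for simple scaling):

```python
from typing import Optional

def can_scale(now: float, last_action: Optional[float], cooldown: float = 300.0) -> bool:
    """Return True if a new scaling action may fire: either no prior action
    has occurred, or the cooldown window has fully elapsed since the last one."""
    return last_action is None or (now - last_action) >= cooldown

print(can_scale(now=100.0, last_action=None))    # no prior action: allowed
print(can_scale(now=400.0, last_action=200.0))   # only 200 s elapsed: blocked
print(can_scale(now=520.0, last_action=200.0))   # 320 s >= 300 s: allowed
```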

Launch Templates vs. Launch Configurations


AWS offers two methods for defining EC2 instance configurations for Auto Scaling Groups.

Launch Templates are the newer, recommended method for defining instance configurations, offering more flexibility and features compared to the older Launch Configurations.

Launch Configurations
  • Status: Older approach, deprecated since December 31, 2023.
  • Editability: Cannot be edited after creation; updating requires creating a new one.
  • Versioning: Not supported.
  • Features: Supports only basic parameters; lacks advanced features.
  • Networking: Do not include any networking information. The default placement tenancy is null, so instance tenancy is controlled by the VPC.
  • AWS recommendation: Not recommended for new deployments.

Launch Templates
  • Status: Newer, AWS-recommended method.
  • Editability: Flexible; supports multiple versions for easy updates.
  • Versioning: Yes, supports multiple versions.
  • Features: Supports advanced options such as Spot Instances, T2/T3 Unlimited, Elastic Graphics, multiple instance types in a single ASG, mixing On-Demand and Spot capacity in the same group, and Dedicated Hosts.
  • Networking: Networking information cannot be included in templates that will be used with Auto Scaling groups; it is configured at the ASG level.
  • AWS recommendation: Recommended best practice.

Certification note: Choose Launch Templates over Launch Configurations in certification questions.
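To make the template contents concrete, here is a sketch of the request shape that boto3's `ec2.create_launch_template` accepts. All identifiers (template name, AMI, key pair, security group, instance profile) are placeholders, and the payload is only constructed locally, never sent:

```python
import base64

# User data: a bootstrap script of the kind described later (install and
# start a web server). The EC2 API expects it base64-encoded.
user_data = "#!/bin/bash\nyum -y install httpd\nsystemctl start httpd\n"

launch_template_request = {
    "LaunchTemplateName": "web-tier-template",          # hypothetical name
    "VersionDescription": "v1 - baseline web tier",
    "LaunchTemplateData": {
        "ImageId": "ami-0123456789abcdef0",             # placeholder AMI ID
        "InstanceType": "t2.micro",
        "KeyName": "my-key-pair",                       # placeholder key pair
        "SecurityGroupIds": ["sg-0123456789abcdef0"],   # placeholder SG ID
        "IamInstanceProfile": {"Name": "web-instance-profile"},
        "UserData": base64.b64encode(user_data.encode()).decode(),
        # Deliberately no subnet / network interface: when the template is
        # used by an Auto Scaling group, networking is set at the ASG level.
    },
}

print(sorted(launch_template_request["LaunchTemplateData"].keys()))
```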

Launch Templates

Launch templates define the configuration details for launching EC2 instances within an Auto Scaling Group.

They are a collection of settings needed to spin up an EC2 instance and are the AWS best practice for specifying instance configurations.

Amazon Machine Image (AMI)

Choose a valid AMI to ensure that the instances will have the necessary operating system and software configuration.

Instance Type

Specify the instance type that best meets the performance requirements of your application, considering CPU, memory, storage, and network performance.

Key Pair

Select a key pair to enable secure SSH access to your instances.

Security Groups

Attach one or more security groups to control inbound and outbound traffic to your instances.

User Data

Include user data scripts to automate the configuration of your instances during launch (e.g., installing HTTPD, starting services, creating HTML content).

Instance Profile

Specify an IAM role to grant instances the necessary permissions to access AWS services.

Network Configuration

Define the subnet and VPC settings. Note that networking information should not be included in a launch template that will be used with an Auto Scaling group, as networking is configured at the ASG level.

Versioning

Utilize template versioning to manage different configurations and ensure consistency across deployments.

Advanced Options

Configure additional settings such as placement groups, tenancy (dedicated or shared), and capacity reservations. Launch templates must be used for Amazon EC2 Dedicated Hosts.

Health Checks

Ensure that health check settings are properly configured to maintain the health of your auto-scaling group.

Cost Management

Consider using reserved instances or spot instances to optimize costs.

Scaling Policies

Scaling policies are rulebooks that dictate when and how to scale EC2 instances.

Auto Scaling provides several options to adjust capacity, from manual adjustments to machine learning-driven forecasts.

Target Tracking Scaling

Most common and easiest to set up. You set a target metric (e.g., average CPU utilization at 50%). Auto Scaling adjusts instance count to stay near the target, similar to a thermostat.
analogy: Like a thermostat maintaining a constant temperature.
Use Cases:
  • Maintaining a specific metric at a target value.

Step Scaling

Scaling happens in steps based on how far a metric is from a threshold. Allows for more granular control. For example, if CPU > 60%, add 1 instance; if CPU > 80%, add 2 instances.
control: Granular control based on metric deviation.
Use Cases:
  • Adjusting capacity based on the severity of a metric breach.
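The CPU example above can be sketched as a small decision function (thresholds are the hypothetical ones from the text, checked from the most severe step down):

```python
def step_adjustment(avg_cpu: float) -> int:
    """Instance-count change for the hypothetical step policy above:
    add 2 when CPU > 80%, add 1 when CPU > 60%, otherwise no change."""
    if avg_cpu > 80:
        return 2
    if avg_cpu > 60:
        return 1
    return 0

print(step_adjustment(65.0))  # 1
print(step_adjustment(85.0))  # 2
print(step_adjustment(50.0))  # 0
```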

Simple Scaling

Older method with a single threshold and action. Less flexible as it can only perform one action per alarm. For example, if CPU > 70% for 5 minutes, add 1 instance. After a scaling action, it waits for a cooldown period.
action_per_alarm: One action per alarm.
cooldown_period: Honors cooldown periods.
Use Cases:
  • Simple, fixed adjustments to capacity.

Scheduled Scaling

Allows you to plan scaling actions in advance based on predictable traffic patterns. For example, increase capacity on weekdays from 9 AM to 5 PM.
trigger: Predefined schedule.
Use Cases:
  • Accommodating predictable load changes.
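The weekday 9 AM-5 PM example can be modeled as a schedule lookup (hypothetical function and capacity values, not an AWS API):

```python
from datetime import datetime

def scheduled_capacity(now: datetime, peak: int = 10, off_peak: int = 4) -> int:
    """Desired capacity for a hypothetical schedule: weekdays 9 AM-5 PM run
    at peak capacity; evenings and weekends drop to the off-peak baseline."""
    if now.weekday() < 5 and 9 <= now.hour < 17:
        return peak
    return off_peak

print(scheduled_capacity(datetime(2024, 6, 3, 11, 0)))  # Monday 11 AM: 10
print(scheduled_capacity(datetime(2024, 6, 8, 11, 0)))  # Saturday: 4
```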

Predictive Scaling

Uses machine learning to forecast future traffic based on historical data and schedules scaling actions proactively, so capacity is ready before demand arrives rather than lagging behind it. It is more flexible than scheduled scaling and can adapt to changes in traffic patterns.
method: Machine learning to forecast load.
Use Cases:
  • Proactively adjusting capacity for unpredictable or complex load patterns.

Manual Scaling

Manually adjust the minimum, maximum, and desired capacity of the Auto Scaling Group. You can also manually scale up or down with the AWS console.
Use Cases:
  • Direct control over capacity.

Maintain Current Instance Levels

Ensures that no instances are added unless an instance fails its health checks and needs to be restarted or replaced.
Use Cases:
  • Ensuring a constant number of instances unless health checks fail.

Auto Scaling Group Workflow and Components

Auto Scaling Groups integrate with various AWS services to monitor and manage EC2 instance fleets.

Load balancers (ALB, NLB, Classic Load Balancer) perform health checks on backend EC2 instances. If an instance fails a health check, the load balancer marks it as unhealthy. The Auto Scaling group then launches a replacement instance, and the unhealthy instance is removed from the target group. This process helps maintain the desired capacity and ensures application availability.
Components: Auto Scaling Group, Scaling Policies, CloudWatch.
Process:
  1. CloudWatch monitors EC2 instance metrics (CPU, memory, network, etc.).
  2. If a metric crosses a predefined threshold (e.g., CPU > 20%), CloudWatch triggers an alarm.
  3. The CloudWatch alarm notifies the relevant Auto Scaling policy.
  4. The Auto Scaling policy instructs the Auto Scaling group to perform a scaling action (e.g., add an instance).
  5. The Auto Scaling group launches a new EC2 instance.
The process works in reverse for scaling in (removing instances).
Cooldown period: a period of time after an auto-scaling action (adding or removing an instance) during which no further scaling actions are triggered. Purpose: prevents rapid or repetitive scaling due to quick metric fluctuations and gives newly launched instances time to start up and stabilize. Types: Default Cooldown (applied automatically) and Custom Cooldown (user-defined). The default value is 300 seconds (5 minutes) in the console for simple scaling policies.
Technical Specs: Default cooldown period: 300 seconds (5 minutes)
Instance warmup: allows time for new instances to become fully operational before they are considered in metric calculations for scaling decisions. Used with Step Scaling and Target Tracking Scaling. During this configurable window, new instances are not counted against health checks.
Lifecycle hooks allow you to perform actions before an instance is launched (scale-out) or terminated (scale-in). This is useful for tasks like software updates or data backups.
Termination policies: determine which instances are terminated during a scale-in event. Options include Default, OldestInstance, NewestInstance, OldestLaunchTemplate, and ClosestToNextInstanceHour. The Default termination policy prioritizes Availability Zone distribution, then instances from the oldest launch template or configuration (configurations first), then instances closest to their next billing hour.
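A rough sketch of the first two tie-breakers of the default policy — rebalance the busiest Availability Zone, then prefer the oldest instance there (launch time stands in for launch-configuration age in this simplified model):

```python
from collections import Counter

def pick_instance_to_terminate(instances):
    """Simplified sketch of the default termination policy: narrow to the AZ
    with the most instances, then pick the oldest candidate in that AZ."""
    az_counts = Counter(i["az"] for i in instances)
    busiest_az = max(az_counts, key=az_counts.get)
    candidates = [i for i in instances if i["az"] == busiest_az]
    return min(candidates, key=lambda i: i["launched_at"])["id"]

fleet = [
    {"id": "i-1", "az": "us-east-1a", "launched_at": 100},
    {"id": "i-2", "az": "us-east-1a", "launched_at": 300},
    {"id": "i-3", "az": "us-east-1b", "launched_at": 200},
]
# us-east-1a has two instances, so it is rebalanced; i-1 is the oldest there.
print(pick_instance_to_terminate(fleet))  # i-1
```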
Instance scale-in protection: prevents instances from being terminated during scale-in events. It can be enabled at creation or changed on running instances, and new instances launched after enabling it inherit the protection. Note: scale-in protection does not prevent manual termination.
Health check grace period: the minimum amount of time to keep a new instance in service before terminating it, even if it fails health checks. This prevents unnecessary termination of newly launched instances that may not immediately pass health checks. The default is 300 seconds in the console but 0 seconds when using the AWS CLI or SDK. Setting a high value reduces the effectiveness of health checks.
Technical Specs: Default grace period: 300 seconds (console), 0 seconds (CLI/SDK)

EC2 Auto Scaling Demonstration: Setup and Configuration


This section details the process of setting up and demonstrating EC2 autoscaling, including the creation of necessary AWS resources.

The primary goal is to show how EC2 instances automatically scale based on demand and health checks, using a Launch Template, Auto Scaling Group, Target Group, and Application Load Balancer.

Prerequisites

  • Existing Security Groups (CS demo security group, CS LB demo security group)
  • Amazon Linux 2023 AMI
  • T2 micro instance type
Step 1: Create a Launch Template
💡 Defines the configuration for new EC2 instances launched by the Auto Scaling group.

Step 2: Create an Auto Scaling Group
💡 Manages the collection of EC2 instances, ensuring a desired number are running and scaling as needed.

Step 3: Observe the Initial Instance Launch
💡 The ASG launches one EC2 instance based on the desired capacity, accessible via its public IP.

Step 4: Create a Target Group
💡 Routes traffic to registered targets (EC2 instances).

Step 5: Create an Application Load Balancer (ALB)
💡 Distributes incoming application traffic across multiple targets.

Step 6: Update the Auto Scaling Group to Attach the ALB and Target Group
💡 Connects the ASG to the ALB for traffic distribution.

Step 7: Access the Application via the ALB's DNS Name
💡 Verifies that traffic is routed through the ALB to the backend instance.

Step 8: Simulate an Outage
💡 Demonstrates the ASG's fault tolerance: when an instance is stopped, the ASG detects the unhealthy target and launches a replacement.

Step 9: Simulate Instance Recovery
💡 Demonstrates the ASG's capacity management: when the original instance is restarted, the ASG observes that desired capacity is exceeded and terminates the redundant instance.

Integration with Other AWS Services

EC2 Auto Scaling works seamlessly with various AWS services to build robust, scalable, and highly available architectures.

ELB distributes incoming traffic across multiple EC2 instances managed by an Auto Scaling group, ensuring high availability and fault tolerance. ELB also performs health checks, and if an instance fails, the Auto Scaling group replaces it.
CloudWatch monitors EC2 instance metrics (CPU, memory, network, etc.) and triggers alarms when predefined thresholds are crossed. These alarms notify Auto Scaling policies to initiate scaling actions (scale out/in).
Auto Scaling can use a target tracking scaling policy based on a custom Amazon SQS queue metric, like 'backlog per instance'. This helps dynamic scaling adjust to the demand curve of applications processing messages from a queue, ensuring performance during sudden spikes in orders.
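The "backlog per instance" metric and the fleet size it implies can be sketched as follows; the acceptable-backlog figure is an assumed application-specific value (typically derived from per-message processing time and a latency target):

```python
def backlog_per_instance(queue_depth: int, running_instances: int) -> float:
    """Custom metric for queue-driven scaling: approximate number of
    visible SQS messages divided by the number of in-service instances."""
    return queue_depth / max(running_instances, 1)

def desired_for_backlog(queue_depth: int, acceptable_backlog: int) -> int:
    """Instances needed so each one's share of the queue stays within the
    acceptable backlog (ceiling division)."""
    return -(-queue_depth // acceptable_backlog)

# 1,500 queued messages across 10 instances = 150 messages per instance;
# if each instance should handle at most 100, the group needs 15 instances.
print(backlog_per_instance(1500, 10))      # 150.0
print(desired_for_backlog(1500, 100))      # 15
```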
Route 53, a highly available DNS service, directs users to applications, often to an ELB that fronts an Auto Scaling Group. While Route 53 itself has routing options like failover, the ASG handles the underlying EC2 instance scaling.
For ElastiCache deployment options, specifically 'Provisioned (Self-Designed)', you can configure autoscaling policies for the cache nodes.
Elastic Beanstalk is an orchestration service that provisions resources including load balancing and Auto Scaling, and configures monitoring. For 'Load Balancing and Auto Scaling' environment types, Auto Scaling handles launching, configuring, and terminating instances. Immutable deployments use temporary ASGs.
DynamoDB, a non-relational database service, offers two capacity models: provisioned and on-demand. The provisioned model is best for predictable workloads where auto scaling can be used to adjust read and write capacity units (RCUs/WCUs) as needed.

Advanced Configuration and Best Practices

Implementing best practices and advanced configurations optimizes EC2 Auto Scaling for performance, cost, and reliability.

Use a 1-minute frequency for CloudWatch metric data collection for faster response times. Turn on Auto Scaling group metrics for accurate capacity forecasting. Avoid burstable performance instance types (such as T2 and T3) to prevent performance limitations. Bake dependencies into the AMI to reduce instance provisioning time.
Use Reserved Instances (RIs) for baseline usage and Spot Instances for additional, variable capacity to minimize EC2 costs without affecting availability. On-demand instances are flexible but not cost-optimized for predictable loads.
Cloud environments offer elasticity, allowing you to scale resources up or down automatically based on demand. This eliminates the need for upfront capacity estimations and optimizes resource utilization. You can monitor demand and system usage to dynamically adjust capacity, ensuring optimal performance and cost efficiency.
A steady-state group can have a minimum, maximum, and desired capacity all set to 1. This is used for high availability of critical resources or legacy resources where only one resource can be online at a time.
Technical Specs: Min Capacity: 1, Max Capacity: 1, Desired Capacity: 1
To troubleshoot an instance, put it into Standby mode. This allows the group to scale up if needed, and then you can troubleshoot the instance and put it back into the InService state when complete.

Auto Scaling Prediction Challenge: Understanding EC2 Instance Scaling

Understanding how various CPU utilization levels and instance warmup periods affect the number of running instances within an Auto Scaling group with step scaling policies.

Maximum capacity: 20 instances. Desired capacity: 10 instances. Minimum capacity: 5 instances. Instance warmup period is 5 minutes. Policies include scale-in (remove 1-2 instances) and scale-out (add 1-2 instances) actions based on average CPU utilization thresholds.
Technical Specs: Max: 20, Desired: 10, Min: 5; Warmup: 5 minutes
Scale-in policy: remove one instance when average CPU is between 20% and 40% for more than 2 minutes; remove two instances when average CPU is between 0% and 20% for more than 2 minutes.
Scale-out policy: add one instance when average CPU is between 60% and 80% for more than 2 minutes; add two instances when average CPU is between 80% and 100% for more than 2 minutes.
Condition 1: If CPU utilization holds at 63-70% for more than 2 minutes, one instance is launched, but it is not counted in the group's metrics until the 5-minute warmup period ends.
Condition 2: If CPU utilization remains at 60-80% for more than 2 minutes after Condition 1, another scale-out action is triggered, but no additional instance is added because the instance from Condition 1 is still warming up.
Condition 3: If CPU utilization reaches 85% within 5 minutes of Condition 1, a scale-out action to add two instances is triggered, but only one additional instance is launched because one instance is already warming up.
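The three conditions can be checked with a small simulation of how warming-up instances offset a requested step adjustment (a simplified model of the warmup rule described above, using the challenge's capacity limits):

```python
def scale_out_with_warmup(in_service: int, warming_up: int,
                          requested_add: int, max_cap: int = 20) -> int:
    """Sketch of how instance warmup affects step scale-out: instances still
    warming up count toward the requested adjustment, so only the shortfall
    launches, capped by the group's maximum capacity."""
    shortfall = requested_add - warming_up
    return max(0, min(shortfall, max_cap - in_service - warming_up))

# Condition 1: CPU 63-70%, add 1; nothing warming up, so 1 launches.
print(scale_out_with_warmup(10, warming_up=0, requested_add=1))  # 1
# Condition 2: CPU still 60-80%, add 1; 1 already warming up, so 0 launch.
print(scale_out_with_warmup(10, warming_up=1, requested_add=1))  # 0
# Condition 3: CPU hits 85%, add 2; 1 already warming up, so only 1 more.
print(scale_out_with_warmup(10, warming_up=1, requested_add=2))  # 1
```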

Glossary

Auto Scaling Group (ASG)
A collection of EC2 instances treated as a logical grouping for the purpose of auto-scaling and management.
Launch Template
Specifies the configuration details for launching EC2 instances (AMI, instance type, key pair, security groups, etc.) and is the AWS-recommended method over Launch Configurations.
Launch Configuration
An older, deprecated method for defining the specifications for new EC2 instances launched by Auto Scaling; it has limitations and cannot be edited after creation.
Scaling Out
Increasing compute capacity by adding more EC2 instances when demand rises.
Scaling In
Decreasing compute capacity by removing EC2 instances when demand falls.
Cooldown Period
A period of time after an auto-scaling action during which no further scaling actions are triggered, preventing rapid or repetitive scaling.
Warmup Period
A configurable window where new instances are not counted against health checks or considered in metric calculations, allowing them time to become fully operational.
Thrashing
Occurs when instances are constantly added and removed due to rapid fluctuations in metrics, leading to inefficient scaling.
Desired Capacity
The target number of EC2 instances an Auto Scaling group aims to maintain at a given point in time for steady-state workload.
Target Tracking Scaling
A scaling policy that maintains a specified metric (e.g., CPU utilization) at a target value by adjusting instance count.
Step Scaling
A scaling policy where capacity adjustments are made in steps based on how far a metric is from a threshold, allowing granular control.
Predictive Scaling
A scaling policy that uses machine learning to forecast future traffic and proactively adjust capacity, preventing scaling issues.
Instance Scale-in Protection
A setting that prevents specific instances from being terminated during scale-in events within an Auto Scaling Group.
